
codeflash-ai bot commented Oct 23, 2025

📄 143% (2.43x) speedup for GeoCoordinate._to_dict in weaviate/collections/classes/types.py

⏱️ Runtime: 1.27 milliseconds → 524 microseconds (best of 166 runs)
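The percentages in this report are consistent with computing speedup as (old - new) / new rather than (old - new) / old. For example, 1.27 ms → 524 μs gives (1270 - 524) / 524 ≈ 1.42, about 142%, which corresponds to a runtime ratio of roughly 2.4x. A small sketch of that arithmetic follows; speedup_pct and speedup_ratio are hypothetical helper names, and the convention is inferred from the reported figures rather than taken from Codeflash's documentation.

```python
def speedup_pct(old: float, new: float) -> float:
    """Percent speedup, assuming the (old - new) / new convention."""
    if new <= 0:
        raise ValueError("new runtime must be positive")
    return (old - new) / new * 100.0


def speedup_ratio(old: float, new: float) -> float:
    """How many times faster the new version is (old / new)."""
    return old / new


# Overall runtime from the report, both expressed in microseconds.
old_us, new_us = 1270.0, 524.0
print(f"{speedup_pct(old_us, new_us):.0f}% faster, {speedup_ratio(old_us, new_us):.2f}x")
# prints "142% faster, 2.42x"
```

Under this convention a "100% faster" result means the new code takes half the time, not zero time.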

📝 Explanation and details

The optimization replaces Pydantic's model_dump(exclude_none=True) call with a dictionary comprehension that filters None values out of self.__dict__. This achieves a roughly 142% speedup (about 2.4x faster overall) by skipping Pydantic's serialization machinery.

Key changes:

  • Direct attribute access: {k: v for k, v in self.__dict__.items() if v is not None} reads the stored field values directly instead of going through model_dump()
  • Simpler filtering: a plain Python dictionary comprehension replaces Pydantic's exclude_none=True handling
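A minimal sketch of the before and after, assuming Pydantic v2. GeoCoordinateSketch and its to_dict_original/to_dict_optimized methods are hypothetical stand-ins for the real GeoCoordinate._to_dict in weaviate/collections/classes/types.py; the field bounds mirror the ranges the generated tests exercise (latitude in [-90, 90], longitude in [-180, 180]).

```python
from typing import Dict

from pydantic import BaseModel, Field


class GeoCoordinateSketch(BaseModel):
    """Hypothetical stand-in for weaviate's GeoCoordinate (Pydantic v2 assumed)."""

    latitude: float = Field(..., ge=-90, le=90)
    longitude: float = Field(..., ge=-180, le=180)

    def to_dict_original(self) -> Dict[str, float]:
        # Before: route the call through pydantic-core's serializer.
        return self.model_dump(exclude_none=True)

    def to_dict_optimized(self) -> Dict[str, float]:
        # After: read the already-validated field values straight from
        # __dict__, dropping any entry whose value is None.
        return {k: v for k, v in self.__dict__.items() if v is not None}


geo = GeoCoordinateSketch(latitude=45.0, longitude=90.0)
print(geo.to_dict_original())   # {'latitude': 45.0, 'longitude': 90.0}
print(geo.to_dict_optimized())  # {'latitude': 45.0, 'longitude': 90.0}
```

For a flat model like this, __dict__ holds exactly the validated field values, which is why both methods return the same dictionary.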

Why this is faster:

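  • model_dump() is comparatively heavy for a two-field model: each call goes through pydantic-core's serializer, which processes the include/exclude and exclude_none options and serializes each field. It does not re-run field validation, but this fixed per-call cost is what makes the original roughly 2.4x slower here
  • A dictionary comprehension over __dict__ is a lightweight operation whose result is already the desired output
  • The shortcut is safe because GeoCoordinate has only two float fields (latitude and longitude), both required and range-checked by Pydantic Field constraints at construction, so neither can be None and there is no nested model to convert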
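The safety argument relies on GeoCoordinate's flat, two-float shape. On a general Pydantic v2 model the two approaches can diverge, and the sketch below (using hypothetical Inner, Outer, and Loose models and a hypothetical fast_dump helper) shows how: nested models are left as model instances rather than converted to dicts, exclude_none is not applied inside nested models, and extra fields accepted via extra="allow" are stored in __pydantic_extra__ rather than __dict__. None of these cases applies to a model whose only fields are two required floats.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, ConfigDict


def fast_dump(model: BaseModel) -> Dict[str, Any]:
    # The optimized pattern: top-level __dict__ entries whose value is not None.
    return {k: v for k, v in model.__dict__.items() if v is not None}


class Inner(BaseModel):
    x: int
    note: Optional[str] = None


class Outer(BaseModel):
    inner: Inner


class Loose(BaseModel):
    model_config = ConfigDict(extra="allow")
    a: int


outer = Outer(inner=Inner(x=1))
# model_dump recurses into the nested model and drops its None field.
print(outer.model_dump(exclude_none=True))  # {'inner': {'x': 1}}
# The comprehension keeps the nested model instance as-is.
print(type(fast_dump(outer)["inner"]).__name__)  # Inner

loose = Loose(a=1, b=2)
# Extra fields appear in model_dump but live outside __dict__.
print(loose.model_dump(exclude_none=True))  # {'a': 1, 'b': 2}
print(fast_dump(loose))  # {'a': 1}
```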

Performance characteristics:
The optimization shows speedups of 129-360% across the timed test cases, with the largest gains on:

  • High-precision floats (352% faster)
  • Extended classes with additional fields (315% faster)
  • Basic coordinate operations (155-310% faster)

Large batch processing sits at the lower end of the range (139-142% faster).

This optimization is most useful for high-frequency coordinate processing where _to_dict() is called repeatedly: it removes unnecessary Pydantic overhead while producing the same output format.
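To reproduce the comparison outside the Codeflash harness, a timeit micro-benchmark along these lines could be used. It is a sketch under stated assumptions: it defines its own hypothetical stand-in model (Coord) instead of importing weaviate, assumes Pydantic v2, and its absolute numbers will vary with the Python and Pydantic versions installed, so they will not match the report exactly.

```python
import timeit
from typing import Callable, Dict

from pydantic import BaseModel, Field


class Coord(BaseModel):
    """Hypothetical stand-in with the same two bounded float fields."""

    latitude: float = Field(..., ge=-90, le=90)
    longitude: float = Field(..., ge=-180, le=180)


def via_model_dump(c: Coord) -> Dict[str, float]:
    return c.model_dump(exclude_none=True)


def via_dict_comprehension(c: Coord) -> Dict[str, float]:
    return {k: v for k, v in c.__dict__.items() if v is not None}


def best_per_call_us(
    fn: Callable[[Coord], Dict[str, float]],
    obj: Coord,
    number: int = 10_000,
    repeat: int = 5,
) -> float:
    # Best-of-N timing (like the report's "best of" runs), in microseconds per call.
    runs = timeit.repeat(lambda: fn(obj), number=number, repeat=repeat)
    return min(runs) / number * 1e6


if __name__ == "__main__":
    c = Coord(latitude=12.5, longitude=-45.0)
    assert via_model_dump(c) == via_dict_comprehension(c)
    old = best_per_call_us(via_model_dump, c)
    new = best_per_call_us(via_dict_comprehension, c)
    print(f"model_dump: {old:.3f} us/call, comprehension: {new:.3f} us/call")
    print(f"speedup: {(old - new) / new * 100:.0f}%")
```

Taking the minimum of several repeats reduces noise from other processes; it estimates the best case rather than the average.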

Correctness verification report:

  • ⚙️ Existing Unit Tests: 🔘 None Found
  • 🌀 Generated Regression Tests: 5555 Passed
  • ⏪ Replay Tests: 🔘 None Found
  • 🔎 Concolic Coverage Tests: 🔘 None Found
  • 📊 Tests Coverage: 100.0%
🌀 Generated Regression Tests and Runtime
# imports
import pytest
from weaviate.collections.classes.types import GeoCoordinate

# unit tests

# 1. Basic Test Cases

def test_basic_valid_coordinates():
    # Test with valid latitude and longitude
    geo = GeoCoordinate(latitude=45.0, longitude=90.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 5.89μs -> 1.44μs (310% faster)
    assert result == {"latitude": 45.0, "longitude": 90.0}
    # Test with negative values
    geo = GeoCoordinate(latitude=-45.0, longitude=-90.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 1.48μs -> 523ns (182% faster)
    assert result == {"latitude": -45.0, "longitude": -90.0}

def test_basic_zero_coordinates():
    # Test with zero values
    geo = GeoCoordinate(latitude=0.0, longitude=0.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 2.99μs -> 1.07μs (178% faster)
    assert result == {"latitude": 0.0, "longitude": 0.0}

def test_basic_decimal_coordinates():
    # Test with decimal values
    geo = GeoCoordinate(latitude=12.345, longitude=67.89)
    codeflash_output = geo._to_dict(); result = codeflash_output # 2.87μs -> 1.05μs (172% faster)
    assert result == {"latitude": 12.345, "longitude": 67.89}

# 2. Edge Test Cases

def test_edge_latitude_bounds():
    # Test at latitude upper bound
    geo = GeoCoordinate(latitude=90.0, longitude=0.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 2.69μs -> 1.02μs (163% faster)
    assert result == {"latitude": 90.0, "longitude": 0.0}
    # Test at latitude lower bound
    geo = GeoCoordinate(latitude=-90.0, longitude=0.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 1.18μs -> 462ns (155% faster)
    assert result == {"latitude": -90.0, "longitude": 0.0}

def test_edge_longitude_bounds():
    # Test at longitude upper bound
    geo = GeoCoordinate(latitude=0.0, longitude=180.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 2.59μs -> 940ns (175% faster)
    assert result == {"latitude": 0.0, "longitude": 180.0}
    # Test at longitude lower bound
    geo = GeoCoordinate(latitude=0.0, longitude=-180.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 1.05μs -> 460ns (129% faster)
    assert result == {"latitude": 0.0, "longitude": -180.0}

def test_edge_invalid_latitude():
    # Latitude too high
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=90.1, longitude=0.0)
    # Latitude too low
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=-90.1, longitude=0.0)

def test_edge_invalid_longitude():
    # Longitude too high
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=180.1)
    # Longitude too low
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=-180.1)




def test_edge_float_precision():
    # Test with high precision floats
    geo = GeoCoordinate(latitude=12.123456789, longitude=34.987654321)
    codeflash_output = geo._to_dict(); result = codeflash_output # 5.92μs -> 1.31μs (352% faster)
    assert result == {"latitude": 12.123456789, "longitude": 34.987654321}

# 3. Large Scale Test Cases

def test_large_scale_many_instances():
    # Test creating and dumping many instances
    coordinates = [
        GeoCoordinate(latitude=float(i % 180 - 90), longitude=float(i % 360 - 180))
        for i in range(1000)
    ]
    dicts = [g._to_dict() for g in coordinates]
    for i, d in enumerate(dicts):
        assert d == {"latitude": float(i % 180 - 90), "longitude": float(i % 360 - 180)}

def test_large_scale_performance():
    # Check _to_dict over a large batch; elapsed time is recorded for
    # information only, since a hard time threshold would make the test flaky
    import time
    coordinates = [GeoCoordinate(latitude=0.0, longitude=0.0) for _ in range(500)]
    start = time.perf_counter()
    for geo in coordinates:
        result = geo._to_dict() # 400μs -> 165μs (142% faster)
        assert result == {"latitude": 0.0, "longitude": 0.0}
    elapsed = time.perf_counter() - start

def test_large_scale_unique_values():
    # Test that all unique values are preserved in output
    lats = [float(i % 180 - 90) for i in range(500)]
    lons = [float(i % 360 - 180) for i in range(500)]
    coordinates = [GeoCoordinate(latitude=lat, longitude=lon) for lat, lon in zip(lats, lons)]
    dicts = [geo._to_dict() for geo in coordinates]
    for lat, lon, d in zip(lats, lons, dicts):
        assert d == {"latitude": lat, "longitude": lon}

# 4. Determinism Test

def test_determinism_same_input_same_output():
    # The same input should always give the same output
    geo1 = GeoCoordinate(latitude=23.5, longitude=42.1)
    geo2 = GeoCoordinate(latitude=23.5, longitude=42.1)
    codeflash_output = geo1._to_dict() # 5.90μs -> 1.28μs (360% faster)
    assert codeflash_output == geo2._to_dict() == {"latitude": 23.5, "longitude": 42.1}
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
# imports
import pytest  # used for our unit tests
from pydantic import Field  # declares the subclass field in one of the tests below
from weaviate.collections.classes.types import GeoCoordinate

# unit tests

# ------------------------
# 1. Basic Test Cases
# ------------------------

def test_basic_positive_coordinates():
    # Test with valid positive latitude and longitude
    coord = GeoCoordinate(latitude=45.0, longitude=90.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 3.66μs -> 1.08μs (239% faster)
    assert result == {"latitude": 45.0, "longitude": 90.0}

def test_basic_negative_coordinates():
    # Test with valid negative latitude and longitude
    coord = GeoCoordinate(latitude=-45.0, longitude=-90.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 3.22μs -> 1.08μs (199% faster)
    assert result == {"latitude": -45.0, "longitude": -90.0}

def test_basic_zero_coordinates():
    # Test with zero latitude and longitude
    coord = GeoCoordinate(latitude=0.0, longitude=0.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.94μs -> 1.08μs (173% faster)
    assert result == {"latitude": 0.0, "longitude": 0.0}

def test_basic_float_precision():
    # Test with float values with several decimal places
    coord = GeoCoordinate(latitude=12.345678, longitude=98.7654321)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.79μs -> 1.06μs (165% faster)
    assert result == {"latitude": 12.345678, "longitude": 98.7654321}

# ------------------------
# 2. Edge Test Cases
# ------------------------

def test_latitude_upper_bound():
    # Test with latitude at upper bound
    coord = GeoCoordinate(latitude=90.0, longitude=0.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.78μs -> 1.03μs (170% faster)
    assert result == {"latitude": 90.0, "longitude": 0.0}

def test_latitude_lower_bound():
    # Test with latitude at lower bound
    coord = GeoCoordinate(latitude=-90.0, longitude=0.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.71μs -> 1.02μs (167% faster)
    assert result == {"latitude": -90.0, "longitude": 0.0}

def test_longitude_upper_bound():
    # Test with longitude at upper bound
    coord = GeoCoordinate(latitude=0.0, longitude=180.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.71μs -> 998ns (172% faster)
    assert result == {"latitude": 0.0, "longitude": 180.0}

def test_longitude_lower_bound():
    # Test with longitude at lower bound
    coord = GeoCoordinate(latitude=0.0, longitude=-180.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.78μs -> 996ns (180% faster)
    assert result == {"latitude": 0.0, "longitude": -180.0}

def test_latitude_out_of_bounds_high():
    # Test latitude above upper bound should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=91.0, longitude=0.0)

def test_latitude_out_of_bounds_low():
    # Test latitude below lower bound should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=-91.0, longitude=0.0)

def test_longitude_out_of_bounds_high():
    # Test longitude above upper bound should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=181.0)

def test_longitude_out_of_bounds_low():
    # Test longitude below lower bound should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=-181.0)

def test_missing_latitude():
    # Test missing latitude should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(longitude=0.0)

def test_missing_longitude():
    # Test missing longitude should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0)

def test_none_latitude():
    # Test latitude set to None should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=None, longitude=0.0)

def test_none_longitude():
    # Test longitude set to None should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=None)

def test_non_float_latitude():
    # Test latitude as string should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude='not_a_float', longitude=0.0)

def test_non_float_longitude():
    # Test longitude as string should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude='not_a_float')

def test_subclass_fields_included():
    # A field declared on a subclass is included in the _to_dict output,
    # matching what model_dump(exclude_none=True) returned before the change
    class ExtendedGeoCoordinate(GeoCoordinate):
        altitude: float = Field(default=100.0)

    coord = ExtendedGeoCoordinate(latitude=10.0, longitude=20.0, altitude=500.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 5.22μs -> 1.26μs (315% faster)
    assert result == {"latitude": 10.0, "longitude": 20.0, "altitude": 500.0}

# ------------------------
# 3. Large Scale Test Cases
# ------------------------

def test_many_instances():
    # Test creating many instances and dumping them
    coords = [
        GeoCoordinate(latitude=float(i % 181 - 90), longitude=float(i % 361 - 180))
        for i in range(1000)
    ]
    for i, coord in enumerate(coords):
        codeflash_output = coord._to_dict(); result = codeflash_output # 807μs -> 338μs (139% faster)
        expected_lat = float(i % 181 - 90)
        expected_lon = float(i % 361 - 180)
        assert result == {"latitude": expected_lat, "longitude": expected_lon}

def test_large_float_precision():
    # Test with many-decimal values just inside the bounds
    coord = GeoCoordinate(latitude=89.999999999, longitude=179.999999999)
    codeflash_output = coord._to_dict(); result = codeflash_output # 3.79μs -> 1.03μs (269% faster)
    assert result == {"latitude": 89.999999999, "longitude": 179.999999999}

def test_performance_under_load():
    # Create and dump many objects; start/end are recorded for information only
    import time
    start = time.perf_counter()
    coords = [GeoCoordinate(latitude=0.0, longitude=0.0) for _ in range(1000)]
    dicts = [coord._to_dict() for coord in coords]
    end = time.perf_counter()
    # Ensure all dicts are correct
    for d in dicts:
        assert d == {"latitude": 0.0, "longitude": 0.0}
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from weaviate.collections.classes.types import GeoCoordinate

def test_GeoCoordinate__to_dict():
    GeoCoordinate._to_dict(GeoCoordinate(latitude=0.0, longitude=0.0))
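The generated tests compare outputs at fixed points. A randomized differential check is a cheap complement: it asserts that the dictionary comprehension and model_dump(exclude_none=True) agree across many in-range coordinates. This sketch assumes Pydantic v2 and uses a hypothetical stand-in model (CoordStandIn) so it runs without a Weaviate install; against the real class, one would construct GeoCoordinate instead and compare _to_dict() with model_dump(exclude_none=True) in the same way.

```python
import random
from typing import Dict

from pydantic import BaseModel, Field


class CoordStandIn(BaseModel):
    # Hypothetical stand-in mirroring the bounds the generated tests exercise.
    latitude: float = Field(..., ge=-90, le=90)
    longitude: float = Field(..., ge=-180, le=180)

    def _to_dict(self) -> Dict[str, float]:
        # Optimized implementation under test.
        return {k: v for k, v in self.__dict__.items() if v is not None}


def test_to_dict_matches_model_dump() -> None:
    rng = random.Random(0)  # fixed seed keeps any failure reproducible
    for _ in range(1_000):
        coord = CoordStandIn(
            latitude=rng.uniform(-90.0, 90.0),
            longitude=rng.uniform(-180.0, 180.0),
        )
        assert coord._to_dict() == coord.model_dump(exclude_none=True)
```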


To edit these changes, run git checkout codeflash/optimize-GeoCoordinate._to_dict-mh2ys2xa and push.


codeflash-ai bot requested a review from mashraf-222 October 23, 2025 05:11
codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Oct 23, 2025